 sub-optimality gap




Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting

Neural Information Processing Systems

Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning (RL). The current paper pertains to a scenario with value-based linear representation, which postulates linear realizability of the optimal Q-function (also called the "linear $Q^{\star}$ problem"). While linear realizability alone does not allow for sample-efficient solutions in general, the presence of a large sub-optimality gap is a potential game changer, depending on the sampling mechanism in use. Informally, sample efficiency is achievable with a large sub-optimality gap when a generative model is available, but is unfortunately infeasible when we turn to standard online RL settings. We make progress towards understanding this linear $Q^{\star}$ problem by investigating a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states. This protocol is more flexible than the standard online RL setting, while being practically relevant and far more restrictive than the generative model. We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space. Our findings underscore the fundamental interplay between sampling protocols and low-complexity function representation in RL.
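
The revisiting protocol is only described informally above; as a minimal sketch (our own reading, not the paper's API), the following Python wrapper illustrates where "limited revisiting" sits between a generative model and standard online access. The class name and the set_state simulator hook are assumptions introduced for illustration.

# Hypothetical sketch of the "online with revisiting" sampling protocol: the agent
# explores forward as in online RL, but may also backtrack to any state it has
# already encountered. A generative model would allow arbitrary (s, a) queries;
# standard online RL would allow no revisiting at all.
class RevisitingSampler:
    def __init__(self, env):
        self.env = env          # assumed episodic simulator with reset/step/set_state
        self.visited = set()    # states encountered so far; only these can be revisited

    def reset(self):
        state = self.env.reset()
        self.visited.add(state)
        return state

    def step(self, action):
        state, reward, done = self.env.step(action)
        if not done:
            self.visited.add(state)
        return state, reward, done

    def revisit(self, state):
        """Backtrack to a previously encountered state (not allowed in pure online RL)."""
        if state not in self.visited:
            raise ValueError("can only revisit states observed in earlier roll-outs")
        return self.env.set_state(state)  # assumes the simulator exposes this hook

The key extra power over standard online RL is the revisit call; the key restriction relative to a generative model is that only previously observed states, not arbitrary state-action pairs, can be queried.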




Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation

Samuel Ainsworth, Matt Barnes, Siddhartha Srinivasa

Neural Information Processing Systems

In this paper, we consider the problem of determining when, along a training roll-out, feedback from the environment is no longer beneficial and an intervention such as resetting the agent to the initial state distribution is warranted. We show that such interventions can naturally trade off a small sub-optimality gap for a dramatic decrease in sample complexity. In particular, we focus on the reinforcement learning setting in which the agent has access to a reward signal in addition to either (a) an expert supervisor triggering the e-stop mechanism in real time or (b) expert state-only demonstrations used to "learn" an automatic e-stop trigger.
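
The abstract leaves the form of the learned trigger open; a minimal sketch, assuming Euclidean state features and a nearest-neighbor rule (the class name and the radius hyperparameter are illustrative, not the authors' method), might look as follows.

# Hypothetical e-stop trigger "learned" from expert state-only demonstrations:
# flag states that lie far from every demonstrated state, at which point the
# agent is reset to the initial state distribution.
import numpy as np


class NearestNeighborEStop:
    def __init__(self, demo_states, radius):
        self.demo_states = np.asarray(demo_states, dtype=float)  # (N, d) expert states
        self.radius = radius  # distance beyond which further feedback is deemed unhelpful

    def should_stop(self, state):
        state = np.asarray(state, dtype=float)
        dists = np.linalg.norm(self.demo_states - state, axis=1)
        return dists.min() > self.radius

Resetting whenever should_stop fires is exactly the kind of intervention described above: it accepts a small sub-optimality gap in exchange for not wasting samples on roll-outs that have drifted far from useful behavior.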



Optimal Stopping vs Best-of-$N$ for Inference Time Optimization

Yusuf Kalayci, Vinod Raman, Shaddin Dughmi

arXiv.org Artificial Intelligence

Large language model (LLM) generation often requires balancing output quality against inference cost, especially when using multiple generations. We introduce a new framework for inference-time optimization based on the classical Pandora's Box problem. Viewing each generation as opening a costly "box" with random reward, we develop algorithms that decide when to stop generating without knowing the underlying reward distribution. Our first contribution is a UCB-style Pandora's Box algorithm, which achieves performance that is provably close to Weitzman's algorithm, the optimal strategy when the distribution is known. We further adapt this method to practical LLM settings by addressing reward scaling across prompts via a Bradley-Terry inspired transformation. This leads to an adaptive inference-time optimization method that normalizes rewards and learns stopping thresholds on the fly. Experiments on the AlpacaFarm and HH-RLHF datasets, using multiple LLM-reward model pairs, show that our adaptive strategy can obtain the same performance as non-adaptive Best-of-N sampling while requiring 15-35 percent fewer generations on average. Our results establish a principled bridge between optimal stopping theory and inference-time scaling, providing both theoretical performance bounds and practical efficiency gains for LLM deployment.
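
As a rough sketch of the stopping rule outlined in the abstract (not the authors' implementation): normalize reward-model scores with a Bradley-Terry style transform, estimate a Weitzman-type threshold from the rewards observed so far, inflate it with a UCB-style bonus, and stop once the best generation clears it. The bisection routine, the specific bonus term, and all function names below are assumptions.

import math


def bt_normalize(score, reference_score):
    """Bradley-Terry style transform: P(this response beats a reference response)."""
    return 1.0 / (1.0 + math.exp(-(score - reference_score)))


def empirical_index(rewards, cost, lo=0.0, hi=1.0, iters=50):
    """Bisection for the threshold sigma solving mean((r - sigma)^+) = cost."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        surplus = sum(max(r - mid, 0.0) for r in rewards) / len(rewards)
        lo, hi = (mid, hi) if surplus > cost else (lo, mid)
    return (lo + hi) / 2.0


def adaptive_best_of_n(generate, score, reference_score, cost, max_n):
    """Keep generating until the best normalized reward clears an optimistic threshold."""
    rewards, best = [], -float("inf")
    for t in range(1, max_n + 1):
        r = bt_normalize(score(generate()), reference_score)
        rewards.append(r)
        best = max(best, r)
        bonus = math.sqrt(math.log(t + 1) / t)  # UCB-style optimism (assumed form)
        if best >= empirical_index(rewards, cost) + bonus:
            break
    return best, len(rewards)

Early on, the optimism bonus keeps the threshold high so the algorithm keeps sampling; as more rewards are observed, the threshold approaches the empirical Weitzman index and generation stops as soon as the expected gain from another sample no longer covers its cost.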



Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis

Ruiquan Huang, Donghao Li, Chengshuai Shi, Cong Shen, Jing Yang

arXiv.org Machine Learning

This paper investigates a hybrid learning framework for reinforcement learning (RL) in which the agent can leverage both an offline dataset and online interactions to learn the optimal policy. We present a unified algorithm and analysis, showing that augmenting confidence-based online RL algorithms with the offline dataset outperforms any pure online or offline algorithm alone and achieves state-of-the-art results under two learning metrics, i.e., sub-optimality gap and online learning regret. Specifically, we show that our algorithm achieves a sub-optimality gap of $\tilde{O}(\sqrt{1/(N_0/\mathtt{C}(\pi^*|\rho)+N_1)})$, where $\mathtt{C}(\pi^*|\rho)$ is a new concentrability coefficient and $N_0$ and $N_1$ are the numbers of offline and online samples, respectively. For regret minimization, we show that it achieves a constant $\tilde{O}(\sqrt{N_1/(N_0/\mathtt{C}(\pi^{-}|\rho)+N_1)})$ speed-up compared to pure online learning, where $\mathtt{C}(\pi^{-}|\rho)$ is the concentrability coefficient over all sub-optimal policies. Our results also reveal an interesting separation in the desired coverage properties of the offline dataset for sub-optimality-gap minimization and regret minimization. We further validate our theoretical findings in several experiments in special RL models such as linear contextual bandits and Markov decision processes (MDPs).
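
As a back-of-the-envelope reading of the stated rates (log factors, constants, and horizon dependence dropped; the sample counts and concentrability coefficient below are made up for illustration), the snippet compares the hybrid sub-optimality bound with the pure online and pure offline rates.

# Rough numerical illustration of the bound sqrt(1 / (N_0 / C + N_1)) versus
# the pure online rate sqrt(1 / N_1) and the pure offline rate sqrt(C / N_0).
import math


def hybrid_gap(n_offline, n_online, concentrability):
    return math.sqrt(1.0 / (n_offline / concentrability + n_online))


def online_gap(n_online):
    return math.sqrt(1.0 / n_online)


def offline_gap(n_offline, concentrability):
    return math.sqrt(concentrability / n_offline)


# With N_0 = 10,000 offline samples of moderate coverage (C = 10) and
# N_1 = 1,000 online samples, the hybrid rate improves on either source alone:
print(hybrid_gap(10_000, 1_000, 10.0))   # ~0.022
print(online_gap(1_000))                 # ~0.032
print(offline_gap(10_000, 10.0))         # ~0.032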